GitHub: https://github.com/wendi-white/biol607_homework

knitr::opts_knit$set(root.dir = "/Users/wendiwhite/Desktop/Ecology/UMass Boston Masters/UMass Masters classes/bio stats 607/data") #setwd for rest of file
#libraries (package startup and conflict messages omitted)
library(ggplot2)
library(car)
library(dplyr)
library(tidyverse)
library(emmeans)
library(readr)
library(brms)
library(tidybayes)
library(ggdist)
library(bayesplot)
library(purrr)
library(modelr)
library(rsample)
library(boot)
1. Comparing Means

To start, let’s warm up with a simple one-way ANOVA model. This example, from Whitlock and Schluter chapter 15, question 22, looks at the mass of lodgepole pine cones from different habitats.

1.1. Load and plot the data. Choose a plot that not only shows the raw data, but also the means and SE or CI of those means. +1 EC if Michael thinks it’s fancy.

pinecones <- read.csv("15q22LodgepolePineCones.csv") #root.dir was set in the setup chunk, so a relative path is enough

ggplot(pinecones, 
       mapping=aes(x=habitat, y=conemass)) + #x = habitat, y = cone mass
  geom_boxplot(alpha=0.7) + #boxes drawn first so the summary sits on top
  stat_summary(color="red", size=1.3) + #mean +/- SE for each habitat
  theme_bw(base_size=17) #font size
## No summary function supplied, defaulting to `mean_se()`

1.2 Fit a model using least squares and evaluate all relevant assumptions. List them out as you test them. Can we use this model? If not, fix it. But if we can, no fix is needed!

pines_lm <- lm(conemass~habitat, data=pinecones)
#assumptions
plot(pines_lm, which=1)  #residuals vs fitted values

plot(pines_lm, which=2) #qq plot 

shapiro.test(residuals(pines_lm)) #test for normality; we fail to reject, so residuals look normal
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(pines_lm)
## W = 0.93363, p-value = 0.2781
plot(pines_lm, which=4) #cooks distances vs row labels

plot(pines_lm, which=5) #plot of residuals against leverages 

1.3 How much variation is explained by your model?

88.5% of the variation in cone mass is explained by habitat (multiple R² = 0.885; adjusted R² = 0.868).

summary(pines_lm) #summary of model
## 
## Call:
## lm(formula = conemass ~ habitat, data = pinecones)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -0.780 -0.405 -0.040  0.505  0.720 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               8.9000     0.2212  40.238 4.97e-15 ***
## habitatisland.present    -2.8200     0.3281  -8.596 1.01e-06 ***
## habitatmainland.present  -2.7800     0.3281  -8.474 1.18e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5418 on 13 degrees of freedom
## Multiple R-squared:  0.8851, Adjusted R-squared:  0.8675 
## F-statistic: 50.09 on 2 and 13 DF,  p-value: 7.787e-07
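The R² values can also be pulled straight out of the fit rather than read off the printed summary. A small sketch (using `pines_lm`, the model fitted above):

```r
#extract variance-explained measures directly from the lm summary object
pines_sum <- summary(pines_lm)
pines_sum$r.squared      #multiple R^2: proportion of variance explained
pines_sum$adj.r.squared  #adjusted R^2, penalized for number of predictors
```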

1.4 Show which means are different from each other. Are you correcting p-values? If so, how, and justify your choice.

Island-absent differs from both island-present and mainland-present; island-present and mainland-present do not differ from each other.

Yes, we correct with a Tukey HSD adjustment, since we are making all pairwise comparisons. In this case the correction doesn’t change any conclusions: refitting the contrasts with adjust = "none" gives the same qualitative answer.

pines_em <- emmeans(pines_lm, ~ habitat) #get least squares means for each habitat 
pines_em
##  habitat          emmean    SE df lower.CL upper.CL
##  island.absent      8.90 0.221 13     8.42     9.38
##  island.present     6.08 0.242 13     5.56     6.60
##  mainland.present   6.12 0.242 13     5.60     6.64
## 
## Confidence level used: 0.95
#compare the different habitats with Tukey-adjusted contrasts
pines_cont <- contrast(pines_em, method = "tukey") %>%
    plot() + #plot the contrasts
  geom_vline(xintercept=0, color= "red") #differences whose CI excludes 0 are significant
pines_cont


#same contrasts with no adjustment: estimates barely move, and island present vs
#mainland present still does not differ significantly
pines_cont_adj <- contrast(pines_em, method = "tukey", adjust = "none") %>%
  plot() + #plot the contrasts
  geom_vline(xintercept=0, color= "red")
pines_cont_adj

2. Comparing Means from Multiple Categories

In a study from Rogers et al. (2020) link, the authors performed an experiment where they moved panels that had been colonized by invertebrates on a dock to a nearby rocky jetty where predators could access the panels. To separate the effects of changes in the abiotic environment from predation, they performed a factorial experiment, either caging or not caging panels, and placing them either on the side of a cinder block or hanging on a piece of PVC attached to the block, where predators would have little access (but weren’t entirely stopped). They then looked at change in total cover of invertebrates. Using this old data file dug off of my hard drive, let’s see what they found.

2.1. Load and plot the data. We are interested in change in percent cover. Choose a plot that not only shows the raw data, but also the means and SE or CI of those means. +1 EC if Michael thinks it’s fancy.

inverts <- read_csv("fouling_transplant_data.csv") #root.dir was set in the setup chunk, so a relative path is enough
## Parsed with column specification:
## cols(
##   Treatment = col_character(),
##   Caged = col_character(),
##   `Position On Block` = col_character(),
##   `botry init` = col_double(),
##   `botry fin` = col_double(),
##   `water init` = col_double(),
##   `water fin` = col_double(),
##   `diplo init` = col_double(),
##   `diplo fin` = col_double(),
##   `distap init` = col_double(),
##   `distap fin` = col_double(),
##   `bugula init` = col_double(),
##   `bugula fin` = col_double(),
##   `Initial Cover` = col_double(),
##   `Final Cover` = col_double(),
##   `Change in Cover` = col_double()
## )
inverts
## # A tibble: 32 x 16
##    Treatment Caged `Position On Bl… `botry init` `botry fin` `water init`
##    <chr>     <chr> <chr>                   <dbl>       <dbl>        <dbl>
##  1 DJHC      Caged Hanging                    13          13            3
##  2 DJHC      Caged Hanging                    11          11            5
##  3 DJHC      Caged Hanging                    21          21            7
##  4 DJHC      Caged Hanging                    24          22           23
##  5 DJHC      Caged Hanging                    14          12            9
##  6 DJHC      Caged Hanging                    28          29           13
##  7 DJHC      Caged Hanging                    14          17            5
##  8 DJHC      Caged Hanging                    38          37            8
##  9 DJHO      Open  Hanging                    17          10           10
## 10 DJHO      Open  Hanging                    35          34           20
## # … with 22 more rows, and 10 more variables: `water fin` <dbl>, `diplo
## #   init` <dbl>, `diplo fin` <dbl>, `distap init` <dbl>, `distap fin` <dbl>,
## #   `bugula init` <dbl>, `bugula fin` <dbl>, `Initial Cover` <dbl>, `Final
## #   Cover` <dbl>, `Change in Cover` <dbl>
inverts <- janitor::clean_names(inverts) #take out spaces and capitalization from col names
inverts
## # A tibble: 32 x 16
##    treatment caged position_on_blo… botry_init botry_fin water_init water_fin
##    <chr>     <chr> <chr>                 <dbl>     <dbl>      <dbl>     <dbl>
##  1 DJHC      Caged Hanging                  13        13          3         3
##  2 DJHC      Caged Hanging                  11        11          5         5
##  3 DJHC      Caged Hanging                  21        21          7         8
##  4 DJHC      Caged Hanging                  24        22         23        21
##  5 DJHC      Caged Hanging                  14        12          9         8
##  6 DJHC      Caged Hanging                  28        29         13        10
##  7 DJHC      Caged Hanging                  14        17          5         3
##  8 DJHC      Caged Hanging                  38        37          8         7
##  9 DJHO      Open  Hanging                  17        10         10         3
## 10 DJHO      Open  Hanging                  35        34         20        20
## # … with 22 more rows, and 9 more variables: diplo_init <dbl>, diplo_fin <dbl>,
## #   distap_init <dbl>, distap_fin <dbl>, bugula_init <dbl>, bugula_fin <dbl>,
## #   initial_cover <dbl>, final_cover <dbl>, change_in_cover <dbl>
ggplot(inverts, #plot change in cover across treatments
       mapping=aes(x=treatment, y=change_in_cover,
                   color=position_on_block)) +
  geom_point(alpha=0.7) + #raw data, semi-transparent
  stat_summary(color="black", size=0.8) + #mean +/- SE on top of the raw data
  theme_bw(base_size=17) #font size
## No summary function supplied, defaulting to `mean_se()`

2.2 Fit a model using likelihood and evaluate all relevant assumptions. Do you meet assumptions?

NOPE!! The QQ plot looks wonky, the residuals vs. fitted plot is uneven, and the Shapiro-Wilk test gives p = 0.017, so we reject the null hypothesis that the residuals are normally distributed.

#use glm b/c looking at likelihood
inverts_glm <- glm(change_in_cover ~ caged*position_on_block, 
                   data = inverts,
                   family = gaussian(link="identity"))#create glm of our data that looks at the interaction of caged and position on block's influence on %cover

plot(inverts_glm, which=1)  #residuals vs fitted values

plot(inverts_glm, which=2) #qq plot 

shapiro.test(residuals(inverts_glm)) #test for normality 
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(inverts_glm)
## W = 0.91664, p-value = 0.01687
plot(inverts_glm, which=4) #cooks distances vs row labels

2.3 If you answered yes to the above…. you are wrong. It doesn’t! Percentage data is weird. Differences in percentages can be even weirder! There are three tried and true solutions here. But they MIGHT not all work.

  Incorporate initial cover as a covariate. This takes out that influence, and as such we’re looking at residuals of change. This sometimes, but not always, works.
inverts_glm_int <- glm(change_in_cover ~ caged*position_on_block + initial_cover, 
                       data = inverts,
                       family = gaussian(link="identity")) #create glm of our data that looks at the interaction of caged and position on block's influence on %cover and also accounts for the influence that initial cover has on this interaction


plot(inverts_glm_int, which=1)  #residuals vs fitted values look more evenly scattered

plot(inverts_glm_int, which=2) #qq plot looks better, though the tails still stray

shapiro.test(residuals(inverts_glm_int)) #still reject the null of normality (p = 0.006)
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(inverts_glm_int)
## W = 0.90022, p-value = 0.006273
plot(inverts_glm_int, which=4) #cooks distances vs row labels; check for single influential points

  Divide change by initial cover to express change as percent change relative to initial cover.
inverts_change_cov <- inverts %>%
  mutate(change_in_cover = (change_in_cover / initial_cover)) #overwrite change_in_cover as change relative to initial cover
inverts_change_cov
## # A tibble: 32 x 16
##    treatment caged position_on_blo… botry_init botry_fin water_init water_fin
##    <chr>     <chr> <chr>                 <dbl>     <dbl>      <dbl>     <dbl>
##  1 DJHC      Caged Hanging                  13        13          3         3
##  2 DJHC      Caged Hanging                  11        11          5         5
##  3 DJHC      Caged Hanging                  21        21          7         8
##  4 DJHC      Caged Hanging                  24        22         23        21
##  5 DJHC      Caged Hanging                  14        12          9         8
##  6 DJHC      Caged Hanging                  28        29         13        10
##  7 DJHC      Caged Hanging                  14        17          5         3
##  8 DJHC      Caged Hanging                  38        37          8         7
##  9 DJHO      Open  Hanging                  17        10         10         3
## 10 DJHO      Open  Hanging                  35        34         20        20
## # … with 22 more rows, and 9 more variables: diplo_init <dbl>, diplo_fin <dbl>,
## #   distap_init <dbl>, distap_fin <dbl>, bugula_init <dbl>, bugula_fin <dbl>,
## #   initial_cover <dbl>, final_cover <dbl>, change_in_cover <dbl>
inverts_glm_change_cov <- glm(change_in_cover ~ caged*position_on_block, 
                              data = inverts_change_cov,
                              family = gaussian(link="identity"))#create lm of our transformed data that looks at the interaction of caged and position on block's influence on %cover

plot(inverts_glm_change_cov, which=1)  #residuals vs fitted values

plot(inverts_glm_change_cov, which=2) #qq plot better than original but not great

shapiro.test(residuals(inverts_glm_change_cov)) #test for normality and fail to reject null
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(inverts_glm_change_cov)
## W = 0.95055, p-value = 0.1494
plot(inverts_glm_change_cov, which=4) #cooks distances vs row labels. cooks distance is way smaller

  Calculate difference in logit cover (so, logit(initial cover) - logit(final cover)). Logit transformations linearize percent cover data, and are often all that is needed to work percent cover into a linear model. You can use car::logit() for this.
  
inverts_change_logit <- inverts %>%
  mutate(change_logit_cover = car::logit(initial_cover) - car::logit(final_cover)) #logit-transform covers to linearize, then take the difference

inverts_glm_logit <- glm(change_logit_cover ~ caged*position_on_block, 
                         data = inverts_change_logit,
                         family = gaussian(link="identity")) #glm of the logit difference against the caged x position interaction

plot(inverts_glm_logit, which=1)  #residuals vs fitted values is better

plot(inverts_glm_logit, which=2) #qq plot is looking wonky

shapiro.test(residuals(inverts_glm_logit)) #test for normality reject null
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(inverts_glm_logit)
## W = 0.9097, p-value = 0.01104
plot(inverts_glm_logit, which=4) #cooks distances vs row labels. cooks looks smaller

  Try all three methods. Which one works so that you can produce valid inference?

Dividing change by initial cover (option 2) works best: its diagnostic plots look good and the Shapiro-Wilk test fails to reject normality (p = 0.149). Incorporating initial cover as a covariate (option 1) is also reasonable, though Cook’s distance flags one influential point and its Shapiro-Wilk p-value is low. The logit-difference model (option 3) still fails the normality test, so I would use option 2 for inference.
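To line the three fixes up side by side, the Shapiro-Wilk p-values can be collected in one pass. A sketch, assuming the three fitted model objects from the chunks above are still in the workspace:

```r
#gather the candidate models and pull one normality p-value from each
fixes <- list(covariate  = inverts_glm_int,
              proportion = inverts_glm_change_cov,
              logit_diff = inverts_glm_logit)
purrr::map_dbl(fixes, ~ shapiro.test(residuals(.x))$p.value)
#only the proportion model clears the usual cutoff
```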

2.4 Great! So, take us home! Using NHST with an alpha of 0.08 (why not), what does this fit model tell you about whether predation matters given how I have described the system? Feel free to replot the data or fit model results if helpful

The Anova tables (for both option 1 and option 2) tell the same story: caging and position on block each significantly affect change in cover at α = 0.08, with no significant caged × position interaction. Since caging is what limits predator access, this indicates that predation matters in this system.

#method 1
Anova(inverts_glm_int) 
## Analysis of Deviance Table (Type II tests)
## 
## Response: change_in_cover
##                         LR Chisq Df Pr(>Chisq)  
## caged                     4.5133  1    0.03363 *
## position_on_block         6.1491  1    0.01315 *
## initial_cover             1.3319  1    0.24847  
## caged:position_on_block   2.0384  1    0.15337  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#method 2
Anova(inverts_glm_change_cov)
## Analysis of Deviance Table (Type II tests)
## 
## Response: change_in_cover
##                         LR Chisq Df Pr(>Chisq)  
## caged                     4.6258  1    0.03149 *
## position_on_block         4.8150  1    0.02821 *
## caged:position_on_block   1.4525  1    0.22813  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
3. Comparing Means with Covariates

We will wrap up with a model mixing continuous and discrete variables. In this dataset from Scantlebury et al., the authors explored how caste and mass affected the energy level of naked mole rats.

3.1 OK, you know what you are about at this point. Load in the data, plot it, fit it, check assumptions. Use Bayes for this.

rats <- read_csv("18e4MoleRatLayabouts.csv") #root.dir was set in the setup chunk, so a relative path is enough
## Parsed with column specification:
## cols(
##   caste = col_character(),
##   lnmass = col_double(),
##   lnenergy = col_double()
## )
summary(rats)
##     caste               lnmass         lnenergy    
##  Length:35          Min.   :3.850   Min.   :3.555  
##  Class :character   1st Qu.:4.248   1st Qu.:3.902  
##  Mode  :character   Median :4.511   Median :4.190  
##                     Mean   :4.540   Mean   :4.193  
##                     3rd Qu.:4.844   3rd Qu.:4.489  
##                     Max.   :5.263   Max.   :5.043
ggplot(data= rats, #plot energy against mass, colored by caste
       mapping = aes(x=lnmass,
                     y=lnenergy,
                     color=caste)) + 
  geom_point(alpha=0.7) + #transparency of points
  theme_bw(base_size=17) #font size

rat_brm <- brm(lnenergy~ lnmass + caste, #bayes model looking at the impacts of mass on energy while also considering caste
               family = gaussian(link="identity"),
               data=rats,
               chain= 3)
## Compiling Stan program...
## (compiler test chatter and MCMC sampling progress for 3 chains of 2,000 iterations omitted)
plot(rat_brm) #look at the bayes plot

rhat(rat_brm)%>%
  mcmc_rhat()#can see all values are close to 1

pp_check(rat_brm, "dens_overlay") #light curves are densities simulated from the posterior
## Using 10 posterior samples for ppc type 'dens_overlay' by default.

                #dark curve is the observed distribution of lnenergy
                #looks like a good fit!


#checking our residuals
pp_check(rat_brm, "error_hist") #residuals look good!
## Using 10 posterior samples for ppc type 'error_hist' by default.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

pp_check(rat_brm, "error_scatter") #predictive errors vs observed values
## Using 10 posterior samples for ppc type 'error_scatter' by default.

        #no strong pattern in the errors,
        #so the model looks reasonable
pp_check(rat_brm, "error_scatter_avg") #average predictive error
## Using all posterior samples for ppc type 'error_scatter_avg' by default.

          #across all posterior draws

#check the chains for autocorrelation (little to none here)
mcmc_acf(rat_brm)

3.2 Examine whether there is an interaction or not using LOO cross-validation. Is a model with an interaction more predictive?

No. loo_compare ranks the additive model (rat_brm) on top, and the interaction model’s elpd is only 0.4 lower with a standard error of 0.8, so adding a caste × mass interaction does not make the model more predictive.

rat_brm_int <- brm(lnenergy~ caste* lnmass, #bayes model looking at the impacts of caste, mass and the interaction of both on energy
               family = gaussian(link="identity"),
               data=rats,
               chain= 3)
## Compiling Stan program...
## recompiling to avoid crashing R session
## (compiler test chatter and MCMC sampling progress for 3 chains omitted)
rat_brm_loo <- loo(rat_brm) #leave-one-out cross-validation for the additive rat Bayes model
rat_brm_int_loo <- loo(rat_brm_int) #leave-one-out cross-validation for the model with the caste x lnmass interaction

loo_compare(rat_brm_loo, rat_brm_int_loo) #compare the predictive performance of the two models
##             elpd_diff se_diff
## rat_brm      0.0       0.0   
## rat_brm_int -0.4       0.8
#rat_brm sits at the top with elpd_diff = 0, and the interaction model is only 0.4 worse with a standard error of 0.8, so the two models are statistically indistinguishable. We therefore keep the simpler model without the caste x lnmass interaction.
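As a quick numeric check, a common rule of thumb is that an `elpd_diff` smaller in magnitude than about twice its standard error gives no support for the more complex model. A sketch, assuming the `rat_brm_loo` and `rat_brm_int_loo` objects from above:

```r
# Rule-of-thumb check on the loo comparison: models whose elpd difference
# is within ~2 standard errors are treated as equivalent, so we keep the
# simpler one. (Sketch; row 2 of the comparison is the lower-ranked model.)
comp <- loo_compare(rat_brm_loo, rat_brm_int_loo)
abs(comp[2, "elpd_diff"]) < 2 * comp[2, "se_diff"] # TRUE: no support for the interaction
```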

3.3 Compare the two castes' energy expenditure at the mean level of log mass. Are they different? How would you discuss your conclusions?

Yes, they are different. The 95% highest posterior density (credible) intervals for the two castes barely overlap, and the interval for the lazy − worker contrast (−0.71 to −0.09) excludes zero, which tells us that workers expend reliably more energy than lazy mole rats at the mean log mass.

emmeans(rat_brm, ~caste) #posterior medians and 95% HPD intervals of energy expenditure for each caste (contrast method belongs in contrast(), not here)
##  caste  emmean lower.HPD upper.HPD
##  lazy     3.96      3.76      4.19
##  worker   4.35      4.18      4.51
## 
## Point estimate displayed: median 
## HPD interval probability: 0.95
emmeans(rat_brm, ~caste) %>% #now express it as a pairwise contrast between the two castes
  contrast(method="tukey")
##  contrast      estimate lower.HPD upper.HPD
##  lazy - worker   -0.396    -0.706   -0.0913
## 
## Point estimate displayed: median 
## HPD interval probability: 0.95
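Beyond the interval, we can put a posterior probability on the direction of the difference. A sketch using `gather_emmeans_draws()` from tidybayes (loaded above) to pull the posterior draws of the lazy − worker contrast; assumes the `rat_brm` model from above:

```r
# Posterior probability that workers expend more energy than lazy mole rats
# at the mean log mass. (Sketch; .value holds the lazy - worker contrast
# for each posterior draw, so .value < 0 means the worker mean is higher.)
emmeans(rat_brm, ~caste) %>%
  contrast(method = "tukey") %>%
  gather_emmeans_draws() %>% #one row per posterior draw of the contrast
  summarise(p_worker_higher = mean(.value < 0))
```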

3.4 Plot the fitted model. Use tidybayes and ggdist with your model to show fit and credible intervals with the raw data points on top. modelr::data_grid() might help as well.

rat_brm <- brm(lnenergy ~ lnmass + caste, #refit the Bayes model: effects of mass on energy while accounting for caste
               family = gaussian(link = "identity"), #gaussian distribution with identity link
               data = rats,
               chains = 3) #the argument is chains, not chain
## Compiling Stan program...
## recompiling to avoid crashing R session
## (rstan's "simple C file" toolchain check failed here with clang/Eigen
## errors on this macOS setup; the model nevertheless compiled and
## sampling proceeded below.)
## Start sampling
## 
## SAMPLING FOR MODEL '84d9b2b17e16707f0a63a60a1d3611d5' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 1.7e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.02213 seconds (Warm-up)
## Chain 1:                0.019426 seconds (Sampling)
## Chain 1:                0.041556 seconds (Total)
## Chain 1: 
## (Chains 2 and 3 produced analogous output and are omitted.)
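Before using the refit model for prediction, it is worth a quick convergence check with bayesplot (loaded above). A sketch; the parameter names `b_Intercept`, `b_lnmass`, and `b_casteworker` are brms's default naming convention and are assumptions here:

```r
# Quick convergence diagnostics (sketch): Rhat values near 1 and
# well-mixed, overlapping trace plots indicate the three chains converged.
rhat(rat_brm) #potential scale reduction factors for all parameters
mcmc_trace(as.array(rat_brm), #bayesplot traceplots of the fixed effects
           pars = c("b_Intercept", "b_lnmass", "b_casteworker"))
```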
rat_newdat <- modelr::data_grid(rats, #grid of 100 lnmass values over the observed range for each caste
                                  caste = unique(caste),
                                  lnmass = modelr::seq_range(lnmass, n = 100)) #modelr:: prefix since the package is not attached

rat_predict <- predict(rat_brm, #posterior predictions over the grid; predict.brmsfit returns a matrix
                       newdata = rat_newdat) #with Estimate, Est.Error, and 95% quantile columns (type/interval arguments are lm-style and do not apply)

rat_newdat <- rat_newdat %>%
  mutate(lnenergy = rat_predict[, "Estimate"]) #add the posterior mean prediction as a column

#get expected values from the model and build a data frame of posterior means
rat_newfit <- emmeans(rat_brm, specs = ~caste + lnmass, #posterior means and HPD intervals
                        at = list(lnmass = seq(4, 5, length.out = 100))) %>% #evaluated at 100 lnmass values
  as_tibble() %>%
  mutate(lnenergy = emmean)


ggplot(data = rats, #original plot of how mass changes energy for the different castes
       aes(x = lnmass,
           y = lnenergy,
           color = caste)) +
  geom_point() + #plot the original data points
  geom_line(data = rat_newfit) + #posterior mean fit line for each caste
  geom_ribbon(data = rat_newfit, #geom_ribbon shades between the lower and upper HPD limits
              aes(ymin = lower.HPD, ymax = upper.HPD, group = caste),
              alpha = 0.1, color = "lightgrey") + #light grey bands with 0.1 transparency
  theme_classic()
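Since the prompt asks for tidybayes and ggdist specifically, here is an alternative sketch of the same figure built from posterior draws, using `add_fitted_draws()` (tidybayes' draw-wise expected values; renamed `add_epred_draws()` in later tidybayes versions) with `stat_lineribbon()` from ggdist:

```r
# Alternative plot from posterior draws: stat_lineribbon() draws the median
# fit line with stacked 50/80/95% credible bands. (Sketch; assumes the
# rat_brm model and rats data from above.)
rats %>%
  modelr::data_grid(caste = unique(caste),
                    lnmass = modelr::seq_range(lnmass, n = 100)) %>%
  add_fitted_draws(rat_brm) %>% #posterior draws of expected lnenergy per grid row
  ggplot(aes(x = lnmass, y = .value, color = caste, fill = caste)) +
  stat_lineribbon(alpha = 0.25) + #median line plus credible bands
  geom_point(data = rats, aes(y = lnenergy)) + #raw data on top
  theme_classic()
```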